Center for Research Computing, University of Notre Dame
2024-01-19
[Allen, Bradley P., Lise Stork, and Paul Groth. 2023. “Knowledge Engineering Using Large Language Models.” arXiv, October 1, 2023. https://arxiv.org/abs/2310.00637]
Indexing
Data Indexing: Clean and extract content from source files such as PDF, HTML, Word, Markdown, and images
Chunking: Divide the text into smaller chunks that fit within the LLM's limited context window
Embedding and Creating Index: Encode each text chunk or image as a vector using a language model, and store the vectors in a searchable index
Retrieve: Given a user input, retrieve the indexed chunks most relevant to it, typically by vector similarity
Generation: The user query and the retrieved documents are combined into a new prompt, and the LLM generates a response based on this augmented context
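
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (chunk_text, embed, build_index, retrieve, build_prompt) and the chunk sizes are hypothetical, and the "embedding" is a toy bag-of-words vector standing in for a real embedding model. The final prompt would be sent to an LLM, which is not called here.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Chunking: split text into overlapping windows of words.
    The sizes are illustrative; real pipelines often count tokens instead."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding (assumption): a sparse word-count vector.
    A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, chunk_size=40, overlap=10):
    """Embedding and creating index: store (chunk, vector) pairs in a list."""
    index = []
    for doc in documents:
        for chunk in chunk_text(doc, chunk_size, overlap):
            index.append((chunk, embed(chunk)))
    return index


def retrieve(index, query, k=2):
    """Retrieve: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Generation (prompt assembly only): combine query and retrieved chunks."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "The context window limits how much text an LLM can read at once.",
        "Notre Dame is located in Indiana.",
    ]
    index = build_index(docs, chunk_size=8, overlap=2)
    question = "How much text can an LLM read?"
    print(build_prompt(question, retrieve(index, question, k=1)))
```

The linear scan in retrieve is only reasonable for a handful of chunks; larger collections usually rely on a vector index that supports approximate nearest-neighbor search.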